Word-Level Confidence Estimation for Machine Translation

نویسندگان

  • Nicola Ueffing
  • Hermann Ney
چکیده

This article introduces and evaluates several different word-level confidence measures for machine translation. These measures provide a method for labeling each word in an automatically generated translation as correct or incorrect. All approaches to confidence estimation presented here are based on word posterior probabilities. Different concepts of word posterior probabilities as well as different ways of calculating them will be introduced and compared. They can be divided into two categories: System-based methods that explore knowledge provided by the translation system that generated the translations, and direct methods that are independent of the translation system. The system-based techniques make use of system output, such as word graphs or N-best lists. The word posterior probability is determined by summing the probabilities of the sentences in the translation hypothesis space that contains the target word. The direct confidence measures take other knowledge sources, such as word or phrase lexica, into account. They can be applied to output from nonstatistical machine translation systems as well. Experimental assessment of the different confidence measures on various translation tasks and in several language pairs will be presented. Moreover, the application of confidence measures for rescoring of translation hypotheses will be investigated.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Word-Level Confidence Estimation for Machine Translation using Phrase-Based Translation Models

Confidence measures for machine translation is a method for labeling each word in an automatically generated translation as correct or incorrect. In this paper, we will present a new approach to confidence estimation which has the advantage that it does not rely on system output such as N best lists or word graphs as many other confidence measures do. It is, thus, applicable to any kind of mach...

متن کامل

An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation

Recently, a growing need of Confidence Estimation (CE) for Statistical Machine Translation (SMT) systems in Computer Aided Translation (CAT), was observed. However, most of the CE toolkits are optimized for a single target language (mainly English) and, as far as we know, none of them are dedicated to this specific task and freely available. This paper presents an open-source toolkit for predic...

متن کامل

Word Confidence Estimation for Speech Translation

Word Confidence Estimation (WCE) for machine translation (MT) or automatic speech recognition (ASR) consists in judging each word in the (MT or ASR) hypothesis as correct or incorrect by tagging it with an appropriate label. In the past, this task has been treated separately in ASR or MT contexts and we propose here a joint estimation of word confidence for a spoken language translation (SLT) t...

متن کامل

Application of Word-Level Confidence Measures in Interactive Statistical Machine Translation

In this paper, we will address the question of how to efficiently integrate word confidence measures into a state-of-the-art interactive statistical machine translation system and improve prediction performance. Different methods will be presented: the selection of words according to their confidence as well as the rejection which has not been investigated so far. Experimental evaluation with r...

متن کامل

Error Detection for Statistical Machine Translation Using Linguistic Features

Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from N best lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation syste...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computational Linguistics

دوره 33  شماره 

صفحات  -

تاریخ انتشار 2007